release: v0.8.65 (provider/route + Fleet epics, hardening, version bump) #3544

Merged
Hmbown merged 22 commits into main from codex/v0.8.65-release-prep
Jun 24, 2026


@Hmbown Hmbown commented Jun 24, 2026

Owner

Summary

The v0.8.65 release — workspace bumped 0.8.64 → 0.8.65, with the provider/route + Fleet epic work landed alongside the release hardening. Every change was re-verified against current main and integrated commit-by-commit with the gate green at each step. Full notes in CHANGELOG.md [0.8.65]; overnight detail in scratchpad/v0.8.65-release-handoff-2026-06-24.md.

Epics advanced (real, tested)

Deferred to 0.8.66 (honest, with reasons — see handoff)

#3205 router de-hardcoding (high-risk runtime auto-routing), #3478 visible card re-anchor (needs ui.rs hot-path follow-up; proven infra on branch codex/issue-3478-…), #3075 model-picker catalog rows (now unblocked by #3385), #1519 arbitrary-named distinct identities, live-number verification for #2963/#2984 (need creds). #3494 dropped per maintainer.

Testing

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --locked -- -D warnings
  • cargo test — lib/protocol/cli/whaleflow/state + codewhale-tui --bins (5275 passed, 0 failed)
  • cargo build --release -p codewhale-cli -p codewhale-tui; codewhale --version → 0.8.65
  • ./scripts/release/check-versions.sh (0.8.65 consistent) · cargo audit clean · check-provider-registry.py · web lint/test/check-facts/build
  • Manual TUI QA (six-worker fanout, multi-terminal route isolation, queued steering) — per release-qa-sweep, needs a human run

Checklist

  • Updated docs/comments (CHANGELOG, README, AGENTS/CLAUDE)
  • Added/updated tests for every code slice
  • Verified TUI behavior manually — automated tests cover the slices; live-TUI QA pending
  • Co-author credit uses GitHub numeric noreply (no bot trailers; harvested community PRs handled separately — see handoff)

Does not tag, publish, or merge to main — those remain maintainer-gated.

🤖 Generated with Claude Code

Hmbown and others added 7 commits June 23, 2026 23:50
…op unused import)

Audit #2/#9/#10 (scratchpad/bug-audit-2026-06-24.md):
- Parse each KV digest entry independently so one malformed record can no
  longer blank the whole archive (#9).
- Accept locale params and localize empty-state + archive chrome for /zh (#10).
- Add a proper generateMetadata via buildPageMetadata (title/desc + metadataBase
  for the route) and drop the unused next/link import that tripped lint (#2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Audit #3 (scratchpad/bug-audit-2026-06-24.md): `npm run build` ran
derive-facts.mjs which always stamped a fresh generatedAt into the tracked
web/lib/facts.generated.ts, dirtying the working tree on every clean build.

Preserve the committed generatedAt when every *checked* fact (the same set the
drift gate compares, excluding generatedAt + runtime-only latestRelease) is
unchanged, so a clean rebuild leaves the tracked file byte-identical. A real
fact change still stamps a fresh timestamp.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Audit #4 (scratchpad/bug-audit-2026-06-24.md): Next build warned that
metadataBase was unset and fell back to http://localhost:3000 for social image
resolution on root-segment routes (/_not-found and the root opengraph-image),
which never inherited the per-locale layout's metadata.

Add a minimal root app/layout.tsx that supplies metadataBase for every route;
the per-locale <html>/<body>, fonts, and content metadata stay in
app/[locale]/layout.tsx. Build now emits no metadataBase warning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Audit #1 (scratchpad/bug-audit-2026-06-24.md): `cargo clippy --workspace
--all-targets --locked -- -D warnings` failed with 9 lints. Fixes:

- field_reassign_with_default: build ProvidersConfig with struct-update syntax
  in the deepseek_anthropic test helper (client.rs).
- needless_borrow / needless_borrows_for_generic_args: drop three borrows in
  session_picker + widgets composer tests.
- await_holding_lock: document why the env lock is intentionally held across
  the child-process await in the secret-env isolation test (js_execution.rs).
- print_stderr: localized allow for the test-only pandoc skip diagnostic, which
  trips the module-wide deny meant for prod code.
- too_many_arguments (x3): narrow, documented allows on the two SSE parsers
  (shared mutable parser-state set on the hot streaming path) and
  auto_review_plan_decision (mirrors AutoReviewContext::from_tool_call).

Gate now green; cargo fmt --check clean; touched tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Audit #8 (scratchpad/bug-audit-2026-06-24.md): the Fleet setup planner tracked
a selected role/model but Enter/g/G always inserted the same hard-coded
reviewer.toml authoring prompt, so the selection had no functional outcome.

Build the profile prompt on demand from the live selection: the Role lane picks
the profile file stem + role_hint, and the Model lane maps to model_class_hint
(fast/balanced/deep-reasoning/tool-heavy/inherit). Adds a regression test that
navigating to builder + fast yields builder.toml with the matching hints.

Audit #7 (modal i18n) is intentionally deferred to the #3167 interactive-picker
rework (documented in the module header) to avoid translating ~90 volatile
technical strings that the redesign will churn; CmdFleetDescription is already
localized and the selection wiring above is locale-independent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
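The selection wiring described in the Audit #8 commit can be pictured with a minimal sketch. The `ProfileRequest` struct and `profile_request` function are hypothetical stand-ins, not the names in fleet_setup.rs; the label-to-hint mapping follows the hint values listed in the commit message and the `model_class_hint` excerpt quoted in the review.

```rust
/// Hypothetical stand-in for the authoring request the planner assembles.
#[derive(Debug, PartialEq)]
struct ProfileRequest {
    file_name: String,
    role_hint: String,
    model_class_hint: &'static str,
}

/// Map a Model-lane label onto a model_class_hint; unknown labels inherit.
fn model_class_hint(label: &str) -> &'static str {
    match label {
        "fast" => "fast",
        "balanced" => "balanced",
        "strong" | "deep-reasoning" => "deep-reasoning",
        "tool-heavy" => "tool-heavy",
        _ => "inherit",
    }
}

/// Build the request from the live selection instead of a fixed template.
fn profile_request(role: &str, model_label: &str) -> ProfileRequest {
    ProfileRequest {
        // The Role lane picks the profile file stem and the role hint.
        file_name: format!("{role}.toml"),
        role_hint: role.to_string(),
        // The Model lane maps to the model class hint.
        model_class_hint: model_class_hint(model_label),
    }
}
```

Under these assumptions, `profile_request("builder", "fast")` yields `builder.toml` with `role_hint = "builder"` and `model_class_hint = "fast"`, which is the shape the regression test asserts.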
Audit #5 (scratchpad/bug-audit-2026-06-24.md): AGENTS.md and CLAUDE.md still
named codex/v0.8.63-integration, workspace 0.8.63, and milestone v0.8.63, which
rot between release lanes and mislead release work (README was already current
at 0.8.64).

Replace the hard-coded branch/version/milestone with live-truth guidance (read
version from Cargo.toml, confirm the lane from the active handoff + git, list the
milestone via gh). Harvests the framing from draft PR #3452 by @Hmbown while
KEEPING the guardrails that PR dropped (no-speculative spawn_blocking, sole base
prompt, agent-only sub-agent surface, approval gates, known-flaky papercuts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Audit #11 (scratchpad/bug-audit-2026-06-24.md): cargo audit reported three
unmaintained-crate warnings (RUSTSEC-2024-0388 derivative, RUSTSEC-2025-0057
fxhash, RUSTSEC-2024-0436 paste), all transitive only via the starlark 0.13.0
family used by crates/tui and crates/whaleflow. None is a vulnerability and none
is fixable without an upstream starlark release.

Add .cargo/audit.toml ignoring exactly those three IDs with annotated rationale
and a removal trigger, so cargo audit stays clean and surfaces genuinely new
advisories. cargo audit now reports no warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
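For orientation, a cargo-audit ignore list of the kind the Audit #11 commit describes might look like the fragment below. The three advisory IDs come from the commit message; the comment wording is illustrative and is not the committed rationale.

```toml
# .cargo/audit.toml (sketch; comments are illustrative)
[advisories]
ignore = [
    "RUSTSEC-2024-0388", # derivative: unmaintained, transitive via starlark 0.13.0
    "RUSTSEC-2025-0057", # fxhash: unmaintained, transitive via starlark 0.13.0
    "RUSTSEC-2024-0436", # paste: unmaintained, transitive via starlark 0.13.0
]
# Removal trigger: drop these entries once an upstream starlark release
# no longer pulls the crates in.
```

Listing exact IDs rather than disabling a whole advisory class keeps any new advisory visible in `cargo audit` output.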
claude Bot commented Jun 24, 2026


Claude finished @Hmbown's task in 5m 10s


PR Review: fix(release): v0.8.65 release hardening (audit findings)

Status: Complete

Todo:

  • Gather context (PR metadata, changed files)
  • Review TUI changes (fleet_setup.rs, client.rs, chat.rs, engine.rs)
  • Review web changes (digest/page.tsx, layout.tsx, derive-facts.mjs)
  • Review build/config changes (.cargo/audit.toml, AGENTS.md, CLAUDE.md)
  • Compile final review summary

Summary

Solid release-hardening batch. The correctness fixes are on point, the clippy suppressions are properly justified, and the fleet-setup regression test covers the three key assertions. A few minor concerns below — none blocking.


crates/tui/src/tui/views/fleet_setup.rs — Fix #8 (main change)

Good: Switching from a stored profile_prompt: String (stale at construction) to a lazily-computed profile_prompt() -> String (reflects current selection) is the right design. Storing snapshot in the struct and recomputing on demand is clean.

Test: The new selected_role_and_model_class_drive_generated_profile test navigates correctly:

  • 3× Down in Role lane: manager(0)→main(1)→scout(2)→builder(3)
  • Right + 2× Down in Model lane: current-route(0)→inherit(1)→fast(2)

Minor — model_class_hint can't produce "code" or "review": The prompt string in profile_authoring_prompt lists these as valid model_class_hint values, but the match arm in model_class_hint() can't reach them (no Model lane row has a "code" or "review" label; they'd all fall through to "inherit"). Either add rows for these class hints or remove them from the schema doc string to avoid misleading the generating LLM.

// fleet_setup.rs ~L651 — "code" and "review" listed in the prompt but unreachable here
fn model_class_hint(label: &str) -> &'static str {
    match label {
        "fast" => "fast",
        "balanced" => "balanced",
        "strong" => "deep-reasoning",
        "deep-reasoning" => "deep-reasoning",
        "tool-heavy" => "tool-heavy",
        _ => "inherit",   // "code" / "review" fall here, never returned
    }
}

Nit — variable shadowing in selected_label: the closure param lane shadows the outer lane: usize parameter. Harmless, but a rename to fleet_lane would clarify the type change.

fn selected_label(&self, lane: usize) -> Option<&str> {
    self.lanes
        .get(lane)
        .and_then(|lane| lane.rows.get(row))  // `lane` is now FleetSetupLane, not usize
        ...
}

Provider/model safety — Clean ✓: model_class_hint only affects the TOML content being authored, not provider selection. The current route is explicitly passed as context-only ("Current route context only: provider = ...").


web/app/layout.tsx — Fix #4 (metadataBase)

Potential Next.js constraint: Next.js technically requires the root layout to include <html> and <body> tags. The current implementation returns children directly:

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return children;
}

This works in practice because app/[locale]/layout.tsx provides <html>/<body> for all app routes, and the build passes cleanly. However, it's a deviation from the documented requirement. If a new non-locale route is added at the root segment (e.g., app/some-page/page.tsx), it will render without a proper HTML document skeleton since the root layout doesn't provide one and the [locale] layout won't wrap it. Low risk given the current route structure, but worth noting for future additions.


web/scripts/derive-facts.mjs — Fix #3 (deterministic prebuild)

Logic is correct. The VOLATILE set mirrors the drift gate's ignore list, so the comparison is consistent with what CI checks.

Nit — regex fragility: The regex /export const FACTS\s*:\s*\w+\s*=\s*([\s\S]*?);?\s*$/ extracts the committed JSON object. The lazy [\s\S]*? combined with the optional ;? is correct for the current generated file format, but it would silently return null if the file ends with content after the FACTS declaration. Since the file is generated by the same script this is a low risk; the failure mode is safe (falls back to fresh timestamp). A minor hardening option would be to anchor more tightly (e.g., ([\s\S]+)\s*;\s*$ after the =), though not strictly necessary.


Clippy suppressions (client.rs, chat.rs, engine.rs, js_execution.rs, pandoc.rs)

All five suppressions are well-justified:

  • field_reassign_with_default in client.rs — correct struct-literal form
  • too_many_arguments on SSE hot paths and auto_review_plan_decision — struct bundling would add reborrow noise; comments explain the rationale
  • await_holding_lock in js_execution.rs test — intentional; the lock must span the await to prevent env races
  • print_stderr in pandoc.rs test helper — test-only skip, clearly scoped

No concerns here.


web/app/[locale]/digest/page.tsx — Fix #2

Per-entry try/catch with flatMap is the correct resilience pattern. The isZh binary locale check is appropriate for the current two-locale setup. The generateMetadata export is a clean addition.


.cargo/audit.toml — Fix #11

Suppressing three starlark-transitive "unmaintained" advisories is appropriate. The rationale and removal trigger are documented inline. The cargo tree -i verification claim in the comment is the right evidentiary standard.


AGENTS.md / CLAUDE.md — Fix #5

The de-hardcoding is the right call — "read from Cargo.toml" and "check the live milestone" are better instructions than a version number that will be stale by the next release cycle.


Minor correctness items in other TUI files

  • session_picker.rs / widgets/mod.rs — needless-borrow removals (&selected_id → selected_id, contains(&...) → contains(...)) are correct clippy fixes.
  • client.rs field-reassign-with-default — clean, no behavioral change.

Overall: The audit findings are addressed correctly. The one actionable item is the "code" / "review" gap in model_class_hint — the rest are nits or low-priority hardening notes. Ready to merge once the manual TUI QA pass (noted in the checklist) is complete.

Hmbown and others added 15 commits June 24, 2026 00:39
…2961)

Slice 1 (#3086): make the context report's context window route-aware and
collapse the duplicated pressure-threshold copy.
- ReportBuilder::finish now takes the provider + active route limits and uses
  route_budget::route_context_window_tokens, so a resolved route's context
  window overrides the bare model default in the report.
- build_context_report threads app.api_provider + app.active_route_limits; the
  headless doctor path uses provider_capability(...).context_window (route
  limits are unresolved headless) with an explanatory comment.
- pressure_label now delegates to context_budget::PressureLevel so the
  diagnostic label can no longer drift from the unified thresholds (the "high"
  boundary moves 70% -> 75%, matching the compaction trigger). The blanket
  dead_code allow in context_budget.rs is retained: only from_usage_percent and
  label gain a non-test consumer; ContextBudget and suggests_compaction are
  still pending their engine/TUI wiring.

Slice 2 (#2961, parser only): stop hardcoding Responses usage fields as None.
- parse_responses_usage now derives prompt_cache_miss_tokens as input minus the
  cached hit when cached input tokens are reported, and reads reasoning tokens
  from output_tokens_details.reasoning_tokens, mirroring the Chat-Completions
  parser.

Tests: route window overrides model default in the report; pressure label
matches PressureLevel boundaries; Responses usage surfaces cache-miss and
reasoning when present and stays None otherwise.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
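The Slice 2 derivation can be sketched as follows. The `Usage` struct here is a trimmed, hypothetical record and `derive_usage` is an illustrative name, not the real `parse_responses_usage` signature; only the "miss = input minus cached hit, when a hit is reported" rule comes from the commit message.

```rust
/// Hypothetical, trimmed-down usage record (not the real type).
#[derive(Debug, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    prompt_cache_hit_tokens: Option<u64>,
    prompt_cache_miss_tokens: Option<u64>,
    reasoning_tokens: Option<u64>,
}

/// Derive the optional fields instead of hardcoding them to None.
fn derive_usage(
    input_tokens: u64,
    cached_input_tokens: Option<u64>,
    reasoning_tokens: Option<u64>,
) -> Usage {
    Usage {
        input_tokens,
        prompt_cache_hit_tokens: cached_input_tokens,
        // Miss = input - hit, but only when the provider reported a hit.
        // saturating_sub guards against a payload where hit > input.
        prompt_cache_miss_tokens: cached_input_tokens
            .map(|hit| input_tokens.saturating_sub(hit)),
        // Passed through as reported; absent stays None.
        reasoning_tokens,
    }
}
```

With 1000 input tokens and 300 cached, the derived miss count is `Some(700)`; with no cached figure reported, it stays `None`, matching the "stays None otherwise" test described above.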
…1519)

Light up real pricing on resolved route candidates and add a loopback-exempt
insecure-http advisory, both in the route resolver.

#3085 (pricing keystone): the resolver hardcoded
`Some(PricingSku::UnknownOrStale)` on every candidate because
`ProviderModelOffering` (the type the resolver consumes) carried no cost, and
`route_pricing_sku` takes a `CatalogOffering` the resolver never holds. Thread a
projected `PricingSku` onto `ProviderModelOffering`, populated where the sourced
cost is in scope:
- `CatalogOffering::to_offering` projects via `route_pricing_sku(self)`.
- The Models.dev offering builders project via a new
  `route_pricing_sku_from_cost` helper (same honesty rule, raw cost input).
- `bundled_offerings` (no sourced cost) stays `UnknownOrStale`.
The resolver now carries the matched offering's pricing onto the candidate and
keeps `UnknownOrStale` on every branch with no matched offering. No price is
ever fabricated (the #2608/#3085 honesty rule). `ProviderModelOffering` drops
its `Eq` derive (PricingSku::Token holds f64; still `PartialEq`); `PricingSku`
gains `PartialEq`.
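A minimal sketch of the honesty rule and of why the `Eq` derive had to go. The field names and the `SourcedCost` type are assumptions for illustration; the real `PricingSku` may carry more variants and the real helper may apply additional staleness checks.

```rust
/// Simplified stand-in: f64 implements PartialEq but not Eq,
/// so any type holding this enum can no longer derive Eq.
#[derive(Debug, Clone, PartialEq)]
enum PricingSku {
    Token {
        input_usd_per_million: f64,
        output_usd_per_million: f64,
    },
    UnknownOrStale,
}

/// Hypothetical raw cost input, as a catalog row might carry it.
#[derive(Debug, Clone, Copy)]
struct SourcedCost {
    input: f64,
    output: f64,
}

/// Project a sourced cost onto a pricing SKU without ever inventing a price.
fn route_pricing_sku_from_cost(cost: Option<SourcedCost>) -> PricingSku {
    match cost {
        Some(c) => PricingSku::Token {
            input_usd_per_million: c.input,
            output_usd_per_million: c.output,
        },
        // No sourced cost: report unknown rather than a fabricated zero.
        None => PricingSku::UnknownOrStale,
    }
}
```

The key design point is that "no data" and "free" stay distinguishable: an unpriced offering never collapses into `Token { 0.0, 0.0 }`.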

#1519 (insecure-http warning): after the endpoint is built, push an advisory
"endpoint uses insecure http:// (credentials sent in plaintext)" message when
the base URL is non-loopback `http://`. Loopback (localhost / 127.0.0.0/8 /
::1) is exempt so local Ollama/vLLM/SGLang defaults stay clean. Advisory only:
`validation.ok` stays true. Host parsing is dependency-free (no url crate).

Tests: priced_offering_yields_token_pricing_sku, unpriced_offering_stays_unknown,
http_custom_endpoint_emits_insecure_warning, loopback_http_endpoint_does_not_warn,
https_endpoint_has_no_warning.
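The loopback-exempt check from #1519 can be sketched with the standard library alone. The function names below are illustrative, not the resolver's, and the sketch assumes a lowercase `http://` scheme; the advisory text is the one quoted in the commit message.

```rust
use std::net::IpAddr;

/// Extract the host of an `http://` URL without a URL crate.
/// Returns None for any other scheme (including https).
fn http_host(base_url: &str) -> Option<&str> {
    let rest = base_url.strip_prefix("http://")?;
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()?;
    // Drop any userinfo ("user:pass@host").
    let host_port = authority.rsplit('@').next()?;
    if let Some(bracketed) = host_port.strip_prefix('[') {
        // Bracketed IPv6 literal such as "[::1]:8080".
        bracketed.split(']').next()
    } else {
        host_port.split(':').next()
    }
}

/// localhost, 127.0.0.0/8, and ::1 count as loopback.
fn is_loopback_host(host: &str) -> bool {
    host.eq_ignore_ascii_case("localhost")
        || host
            .parse::<IpAddr>()
            .map(|ip| ip.is_loopback())
            .unwrap_or(false)
}

/// Advisory only: callers would still treat the route as valid.
fn insecure_http_warning(base_url: &str) -> Option<&'static str> {
    let host = http_host(base_url)?;
    if is_loopback_host(host) {
        None
    } else {
        Some("endpoint uses insecure http:// (credentials sent in plaintext)")
    }
}
```

`IpAddr::is_loopback` covers the whole 127.0.0.0/8 block and `::1`, so local Ollama/vLLM/SGLang defaults stay quiet while a remote `http://` endpoint gets the advisory.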
Route-consumption keystone, slices A + B. Surgical and byte-identical
for normal configs today; closes two provider-blind reasoning seams.

Slice A — construct the client FROM the candidate
- client.rs: add `DeepSeekClient::from_candidate(config, candidate)`
  beside `new`. Both now share a private `from_parts(base_url,
  default_model, config)` helper so they cannot drift. `from_candidate`
  overrides base_url <- candidate.endpoint.base_url and default_model <-
  candidate.wire_model_id; the API key and provider still come from
  `Config` because `ReadyRouteCandidate` is secret-free by design.
- Switch the three call sites that already hold a candidate:
  engine.rs `activate_runtime_route`, and ui.rs `switch_provider` +
  `apply_provider_fallback_switch`, all to `from_candidate(&cfg,
  &route.candidate)`. The candidate is a partial-move-safe field of the
  resolved route, still live after `route.config` is taken.

Slice B — fix two provider-blind reasoning seams
- model_routing.rs `resolve_explicit_route_with_inventory`: both arms
  resolved effort with a bare `ReasoningEffort::from_setting`, ignoring
  the candidate's provider. Wrap each with
  `normalize_auto_route_effort_for_provider(candidate.provider, ...)` so
  an explicit route to a non-active provider gets that provider's effort
  floor, not the active provider's raw setting.
- turn_loop.rs auto path: `resolve_auto_effort` applied the selected
  tier with a provider-blind `as_setting()`. Thread `self.api_provider`
  in and normalize via `normalize_auto_route_effort_for_provider`.

Tests
- client.rs: from_candidate_uses_candidate_base_url_and_wire_model and
  from_candidate_matches_new_when_config_agrees (candidates minted via
  the RouteResolver-backed resolve_runtime_route, the sole producer).
- model_routing.rs:
  explicit_route_to_nonactive_provider_uses_that_providers_effort
  (active=deepseek, explicit GLM-5.2 routes to Z.ai with effort
  normalized low -> high).

Deferred to 0.8.66 (explicit non-goals): threading the candidate into
MessageRequest/create_message_stream/turn-loop dispatch, adding a
reasoning field to the candidate, and model_inventory router hard-coding.
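Slice A's "two constructors, one private helper" shape can be sketched like this. All types are simplified, hypothetical stand-ins (`SketchClient` for `DeepSeekClient`, and a four-field `Config`); only the override rules and the secret-free candidate come from the commit message.

```rust
/// Hypothetical, simplified config: the only holder of the secret.
#[derive(Debug, Clone)]
struct Config {
    provider: String,
    api_key: String,
    base_url: String,
    default_model: String,
}

struct Endpoint {
    base_url: String,
}

/// Secret-free by design: no credential field exists here.
struct ReadyRouteCandidate {
    endpoint: Endpoint,
    wire_model_id: String,
}

#[derive(Debug, PartialEq)]
struct SketchClient {
    provider: String,
    api_key: String,
    base_url: String,
    default_model: String,
}

impl SketchClient {
    /// Single construction path so `new` and `from_candidate` cannot drift.
    fn from_parts(base_url: String, default_model: String, config: &Config) -> Self {
        Self {
            provider: config.provider.clone(),
            api_key: config.api_key.clone(),
            base_url,
            default_model,
        }
    }

    fn new(config: &Config) -> Self {
        Self::from_parts(config.base_url.clone(), config.default_model.clone(), config)
    }

    /// Endpoint and wire model come from the candidate; key and provider from Config.
    fn from_candidate(config: &Config, candidate: &ReadyRouteCandidate) -> Self {
        Self::from_parts(
            candidate.endpoint.base_url.clone(),
            candidate.wire_model_id.clone(),
            config,
        )
    }
}
```

When the candidate agrees with the config, both constructors produce equal clients, which is the property the `from_candidate_matches_new_when_config_agrees` test name suggests.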
Persist an additive, plain-strings resolved-route snapshot on the Fleet
receipt so ledgers record which provider/model/protocol a task resolved
to. Closes the gap where ReadyRouteCandidate carried this detail but the
ledger dropped it.

Slice (#3154):
- Add `FleetResolvedRoute` to codewhale-protocol (provider_id,
  provider_kind, canonical_model, wire_model_id, protocol, role, loadout,
  source). Plain strings only — no codewhale-config route type dependency.
- Add `FleetReceipt.resolved_route: Option<FleetResolvedRoute>` behind
  `#[serde(default)]` so pre-existing ledgers still deserialize.
- Thread the route through `FleetTaskVerificationInput` and populate it in
  both receipt builders (task_spec verification path and the
  manager simulated/transport fallback) from a single mint per task.
- Mint via the existing hermetic resolver bridge
  (`route_runtime::resolve_route_candidate`) in `worker_runtime`, reusing
  the effective fleet role/loadout. `canonical_model` stays honest-None
  when the resolver cannot pin one; no reasoning/pricing fields invented.

No-secrets invariant (#3154):
- `FleetResolvedRoute` has no field that can hold a credential. Tests
  assert the serialized receipt/route contains no api_key/bearer/sk-*/
  auth-token/secret markers.

Assertion (#3166 scope #10):
- Extend the landed 10-task smoke to assert every receipt carries a
  resolved route with non-empty provider/wire_model_id, a role, and
  source == "resolver", plus a no-secrets scan over each serialized
  receipt.

Tests: round-trip, legacy back-compat (missing field), no-secrets, and
resolver-mint coverage. cargo test -p codewhale-protocol and the
codewhale-tui fleet suite pass; new code is clippy-clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
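A no-secrets scan over a serialized receipt might look like the sketch below. The marker list follows the one named in the commit message; the function names and the word-boundary rule for `sk-` are assumptions, added because a naive substring match would flag ordinary text such as `task-7`.

```rust
/// Plain markers that should never appear in a serialized receipt.
const MARKERS: [&str; 4] = ["api_key", "bearer", "auth-token", "secret"];

/// True when "sk-" starts a token, i.e. is not preceded by a letter or digit.
/// This avoids false positives on words like "task-7".
fn has_sk_prefixed_token(text: &str) -> bool {
    let bytes = text.as_bytes();
    text.match_indices("sk-")
        .any(|(i, _)| i == 0 || !bytes[i - 1].is_ascii_alphanumeric())
}

/// Case-insensitive scan of serialized text for credential markers.
fn contains_secret_marker(serialized: &str) -> bool {
    let lower = serialized.to_ascii_lowercase();
    MARKERS.iter().any(|m| lower.contains(m)) || has_sk_prefixed_token(&lower)
}
```

A scan like this is only a backstop; the stronger guarantee described above is structural, since `FleetResolvedRoute` has no field that can hold a credential in the first place.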
Two small provider-dashboard slices:

- #2984: add a typed `ProviderMaturity { Experimental, Supported }` marker,
  separate from `ProviderReadiness` (which tracks auth/route state). Seed it
  from a per-provider table (OpenaiCodex => Experimental, everything else
  Supported), carry it on `ProviderDashboardRow`, and surface an
  `experimental` tag in the compact hint only for experimental providers so
  the common case stays noise-free.

- #3083: add an `M`/`m` action in the provider picker that emits a new
  `ViewEvent::ProviderPickerOpenModels { provider }`. The ui.rs handler opens
  the `/model` picker via the existing path and pre-filters it to the
  highlighted provider by seeding the picker's search query with the
  provider display name (reusing the model picker's existing provider-name
  scoping; no model_picker internals touched). Footer hint gains `M models`.

Tests cover the maturity marker/tag for OpenaiCodex vs Deepseek and the
`m`/`M` key emitting ProviderPickerOpenModels for the highlighted provider.
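The maturity marker from #2984 can be pictured as a small table plus a tag helper. The `Provider` enum below lists only three illustrative variants and `compact_hint` is a hypothetical helper; the Experimental/Supported split and the OpenaiCodex seed come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderMaturity {
    Experimental,
    Supported,
}

/// Illustrative subset of providers; the real enum is much larger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Deepseek,
    OpenaiCodex,
    Zai,
}

/// Per-provider seed table: everything is Supported unless listed.
fn provider_maturity(provider: Provider) -> ProviderMaturity {
    match provider {
        Provider::OpenaiCodex => ProviderMaturity::Experimental,
        Provider::Deepseek | Provider::Zai => ProviderMaturity::Supported,
    }
}

/// Append the tag only for experimental providers so the common case stays quiet.
fn compact_hint(name: &str, maturity: ProviderMaturity) -> String {
    match maturity {
        ProviderMaturity::Experimental => format!("{name} experimental"),
        ProviderMaturity::Supported => name.to_string(),
    }
}
```

Keeping maturity separate from readiness means an experimental provider can still be fully authenticated and routable; the tag only signals how settled the integration is.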
…3385)

The default RouteResolver::new() previously sourced only the 4-row hand seam
(deepseek/together/openrouter), so the picker/candidates had real route facts
for almost nothing and fell back to RouteLimits::default() (unknown) for every
other provider/model. bundled_offerings_from_models_dev() existed but was fed
only by test fixtures — there was no committed catalog asset.

This commit:

- Adds crates/config/assets/models_dev.bundled.json, a network-free
  Models.dev-shaped snapshot matching the ModelsDevCatalog deserialization
  shape (models_dev.rs). It is CURATED from in-repo verified facts rather than
  a live models.dev dump: context windows / output caps come from
  crates/tui/src/models.rs and USD-per-million pricing from
  crates/tui/src/pricing.rs. The public models.dev catalog tracks a different
  real model generation than CodeWhale's curated forward-dated set, so a live
  transform would disagree with the repo's own model registry and tests;
  curated-but-accurate was preferred per the issue. Coverage: 13 providers
  (deepseek, zai, moonshot, minimax, openai, anthropic, openrouter, together,
  fireworks, novita, siliconflow, arcee, xiaomi-mimo), 27 chat offerings.
  Pricing is omitted where the repo has no trustworthy per-token rate
  (DeepSeek-native rows, aggregator-hosted DeepSeek, Anthropic, MiMo Token
  Plan) so it surfaces honestly rather than as a fabricated zero.

- Adds catalog loaders BUNDLED_MODELS_DEV_JSON (include_str!),
  bundled_models_dev_catalog(), and bundled_catalog_offerings().

- Wires RouteResolver::new() to merge the asset rows UNDER the hand seam:
  the seam keeps precedence on a (provider, wire id) collision so the curated
  canonical-model joins and the deliberately-unpriced DeepSeek-native entries
  the route invariants assert are preserved, while asset-only rows (GLM, Kimi,
  MiniMax, Qwen, …) now flow real context windows to candidates. Each
  provider's default row uses its built-in DEFAULT_*_MODEL wire id so the
  descriptor-conformance default-route test stays green.

bundled_offerings_from_models_dev keeps its signature. New tests: the asset
parses and deserializes; bundled offerings expose real chat facts; pricing is
honest; and RouteResolver::new() resolves GLM/Kimi to real (non-default)
context windows. cargo test -p codewhale-config and
cargo clippy -p codewhale-config --all-targets -- -D warnings are green.
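The "merge the asset rows UNDER the hand seam" rule amounts to a keyed merge where the seam wins collisions. The sketch below uses a hypothetical `Offering` struct and function name; the `(provider, wire id)` key and the precedence direction are taken from the commit message.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down offering row.
#[derive(Debug, Clone, PartialEq)]
struct Offering {
    provider: String,
    wire_id: String,
    context_window: Option<u32>,
}

/// Merge catalog-asset rows under the hand seam: on a (provider, wire id)
/// collision the seam row replaces the asset row.
fn merge_under_seam(seam: Vec<Offering>, asset: Vec<Offering>) -> Vec<Offering> {
    let mut merged: BTreeMap<(String, String), Offering> = BTreeMap::new();
    // Asset rows go in first...
    for row in asset {
        merged.insert((row.provider.clone(), row.wire_id.clone()), row);
    }
    // ...so seam rows inserted afterwards overwrite any colliding key.
    for row in seam {
        merged.insert((row.provider.clone(), row.wire_id.clone()), row);
    }
    // BTreeMap yields rows in deterministic key order.
    merged.into_values().collect()
}
```

Asset-only rows survive untouched, which is how providers outside the seam pick up real context windows without disturbing the curated entries.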
Integration follow-up: #3385's default_resolver test asserted glm-5.1 pricing
stays UnknownOrStale (true on #3385's pre-#3085 base). On the release branch the
#3085 keystone projects the asset's provider-scoped cost onto the candidate via
route_pricing_sku, so the priced Z.ai row now carries a real PricingSku::Token.
Update the assertion to the integrated reality.
…llback (#2574)

#3386: Untangle mode cycling from permission policy. Introduce
`ModeSessionPrefs` (the durable Agent-era baseline that Plan/YOLO derive
from and restore to) and a pure `base_policy_for_mode(mode, prefs) ->
EffectiveModePolicy` implementing the mode table: Plan = read-only /
no-shell / Suggest, Agent = the baseline, YOLO = shell + trust + Auto.
`set_mode` now refreshes the baseline from the live mirrors when leaving
Agent, then derives and applies the incoming mode's policy in one block.
This subsumes the ad-hoc YoloRestoreState/PlanRestoreState snapshots so
YOLO's elevated authority can no longer bleed into the restored Agent
surface. The boolean fields (allow_shell/trust_mode/approval_mode/yolo)
stay as derived mirrors — no crate-wide type migration.

Behavior is preserved exactly: the shipped advisory review-only
behavior is untouched (no review-intent -> Plan downgrade), and the
existing yolo/plan/cycle round-trip tests stay green.
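The mode table can be sketched as a pure function. The struct layouts, the two-variant `ApprovalMode`, and the choice to clear `trust_mode` in Plan are assumptions made for illustration; the Plan/Agent/YOLO rows themselves follow the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Plan,
    Agent,
    Yolo,
}

/// Simplified: the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalMode {
    Suggest,
    Auto,
}

/// Durable Agent-era baseline that Plan/YOLO derive from and restore to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ModeSessionPrefs {
    allow_shell: bool,
    trust_mode: bool,
    approval_mode: ApprovalMode,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EffectiveModePolicy {
    read_only: bool,
    allow_shell: bool,
    trust_mode: bool,
    approval_mode: ApprovalMode,
}

/// Pure derivation: the baseline is read, never mutated.
fn base_policy_for_mode(mode: Mode, prefs: &ModeSessionPrefs) -> EffectiveModePolicy {
    match mode {
        // Plan: read-only, no shell, Suggest (trust cleared here by assumption).
        Mode::Plan => EffectiveModePolicy {
            read_only: true,
            allow_shell: false,
            trust_mode: false,
            approval_mode: ApprovalMode::Suggest,
        },
        // Agent: exactly the stored baseline.
        Mode::Agent => EffectiveModePolicy {
            read_only: false,
            allow_shell: prefs.allow_shell,
            trust_mode: prefs.trust_mode,
            approval_mode: prefs.approval_mode,
        },
        // YOLO: shell + trust + Auto, regardless of the baseline.
        Mode::Yolo => EffectiveModePolicy {
            read_only: false,
            allow_shell: true,
            trust_mode: true,
            approval_mode: ApprovalMode::Auto,
        },
    }
}
```

Because the function never writes to `prefs`, an Agent → YOLO → Agent round trip re-derives the same Agent policy, so elevated YOLO authority has nowhere to leak from.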

#2574: Make `advance_fallback` capability-aware. It now walks the chain
skipping providers that are not ready (hosted providers missing a key,
via the same `has_api_key_for` the picker uses) while local providers
(Ollama/vLLM/SGLang) are always eligible, appending a clear
"skipped <p>: needs auth" note per skip and a reason on exhaustion.
Readiness is captured into a per-provider snapshot at startup;
`ProviderChain::advance` stays pure. 401s already bypass fallback at the
call site, so a bad key still does not silently rotate providers.

Tests: base_policy_for_mode table; set_mode round-trips (Agent->YOLO->
Agent, Plan->YOLO->Agent, edited-baseline restore); advance_fallback
skip/land, all-unready exhaustion, and local-without-key eligibility.
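The capability-aware walk from #2574 can be sketched as below. `ChainEntry`, `FallbackOutcome`, and the exhaustion wording are hypothetical; the skip note format and the "local providers are always eligible" rule come from the commit message.

```rust
/// Hypothetical readiness snapshot for one provider in the chain.
#[derive(Debug, Clone)]
struct ChainEntry {
    name: &'static str,
    is_local: bool,
    has_api_key: bool,
}

#[derive(Debug, PartialEq)]
struct FallbackOutcome {
    /// Index of the provider to switch to, if any is ready.
    next: Option<usize>,
    /// Human-readable notes: one per skipped provider, plus an exhaustion reason.
    notes: Vec<String>,
}

/// Walk the chain after `current`, skipping providers that are not ready.
fn advance_fallback(chain: &[ChainEntry], current: usize) -> FallbackOutcome {
    let mut notes = Vec::new();
    for (idx, entry) in chain.iter().enumerate().skip(current + 1) {
        // Local providers need no key; hosted providers must have one.
        if entry.is_local || entry.has_api_key {
            return FallbackOutcome { next: Some(idx), notes };
        }
        notes.push(format!("skipped {}: needs auth", entry.name));
    }
    notes.push("fallback chain exhausted: no ready provider remains".to_string());
    FallbackOutcome { next: None, notes }
}
```

Taking the readiness data as plain input mirrors the described split: the snapshot is captured elsewhere, and the walk itself stays a pure function that is easy to test.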
Extract self-contained, behavior-preserving leaves out of the
~6.7k-line crates/tui/src/config.rs into sibling modules under
crates/tui/src/config/, each re-exported behind a `pub use` facade so
every existing `crate::config::<symbol>` path resolves unchanged. No
call site outside config.rs is touched; entangled credential/provider/
default-model/normalization/route logic stays put.

Modules created (config.rs: 6739 -> 6246 lines, ~493 lines moved):
- config/models.rs (160): provider model-name + base-URL constants and
  curated model lists (DEFAULT_*_MODEL/BASE_URL, RECENT_OPENROUTER_*,
  COMMON/OFFICIAL_DEEPSEEK_MODELS, ...). API_KEYRING_SENTINEL stays in
  config.rs with the credential logic.
- config/search.rs (135): SearchProvider, SearchProviderSource,
  SearchProviderResolution, SearchConfig (self-contained [search] types).
- config/subagent_limits.rs (69): sub-agent concurrency/timeout limit
  constants + their two private clamp resolvers (pulled back privately,
  no new external surface).
- config/paths.rs (203): pure filesystem path helpers (config/cache/
  workspace path resolution, env-var path overrides, ~ expansion);
  effective_home_dir/expand_path re-exported pub(crate). Workspace-trust
  and config-load logic stay in config.rs.

scripts/check-provider-registry.py now also scans config/models.rs for
the default model/base-URL constants so the registry gate follows the
split.

Verify: config suite 480 passed / 0 failed; full bin suite 5248 passed
/ 0 failed; cargo build -p codewhale-tui --locked green; clippy clean on
all changed files (the 3 remaining -D warnings errors are pre-existing
too_many_arguments lints in untouched chat.rs/engine.rs); cargo fmt
--all clean; provider registry drift check passes.
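The `pub use` facade pattern behind this split can be shown in one file. Inline modules stand in for the sibling files under crates/tui/src/config/, and the constant's value is the default model named in the follow-up commit later in this PR; treat both as an illustration rather than the actual layout.

```rust
pub mod config {
    // Stand-in for config/models.rs: a private leaf module.
    mod models {
        pub const DEFAULT_TEXT_MODEL: &str = "deepseek-v4-pro";
    }

    // Facade: re-export so `config::DEFAULT_TEXT_MODEL` resolves
    // exactly as it did before the constant moved.
    pub use models::DEFAULT_TEXT_MODEL;
}

/// A call site written before the split compiles unchanged.
pub fn default_model() -> &'static str {
    config::DEFAULT_TEXT_MODEL
}
```

Because callers only ever name the facade path, later moves between leaf modules stay invisible to them; the cost is that tools scanning for definitions (such as the registry check script) must be pointed at the new file.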
… pending live numbers)

Adds benchmark_results/deepseek-anthropic-comparison-2026-06-24.md for #2963.

The deepseek-anthropic / deepseek-claude route already landed (5b8a5ac /
#3449); this is the reporting deliverable, not code. The report:
- documents what landed (route, x-api-key + anthropic-version auth,
  AnthropicMessages wire format, body/SSE/usage parsing) with file:line cites;
- records code-derived findings without live calls: server tools / web search
  are filtered out on encode (anthropic.rs:361) so that capability is not
  exercised via this route today, and reasoning_tokens / server_tool_use are
  always null on the Anthropic usage path vs the Chat-Completions parser;
- gives the comparison methodology and a copy-pasteable live checklist for a
  human with DEEPSEEK_API_KEY to fill in latency/token/correctness numbers;
- states the decision honestly: keep as Experimental, keep-vs-promote PENDING
  the live numbers. No verified verdict is fabricated.
Reshape the front page toward the #3087 intended structure and ground every
claim in current repo facts.

- Identity line is now "the terminal coding agent for any model — open models
  first," matching docs/PROVIDERS.md and the agent guidance.
- New "Providers and routing" section describes the real route system:
  RouteResolver minting a resolved route (endpoint, wire protocol, model ID,
  context limit, price), the network-free Models.dev-shaped catalog,
  route-aware context budgets, and the honest cost states (per-token,
  subscription/quota, credits, local/N-A, unknown/stale).
- Fold sub-agents into a single durable "Fleet" section: ledger at
  .codewhale/fleet.jsonl, idempotent `fleet resume`, typed receipts, and
  roles/profiles/loadouts/slots with strong/balanced/fast model classes.
- Dedicated "Safety" section: three modes, hooks allow/deny/ask, the actual
  sandbox backends (Seatbelt/Landlock+seccomp/bwrap), and /restore.
- Fix stale facts: drop the "Kimi OAuth temporarily broken" note (a working
  kimi_oauth auth_mode ships), mark openai-codex experimental, replace the
  "big vs. cheap" tier wording with strong/balanced/fast, and add the
  qianfan provider that was missing from the list.
- Sync the docs index (Fleet/Sub-agents links) and trim the prose.

English README.md only; localized READMEs tracked as follow-up. Install and
version tokens (--tag v0.8.64, # 0.8.64) are left unchanged for the
check-versions gate.
Comprehensive 0.8.65 release notes covering the provider/route resolution epic
(#2608), Fleet substrate (#3154), config modularization (#3311), and the
release-hardening + correctness fixes integrated on this branch, plus the
v0.8.64..main community work. Compare link added; version bump applied
separately via prepare-release.sh.
…slice

Add a single dynamic provider identity for arbitrary
`[providers.<name>] kind="openai-compatible" base_url api_key_env`
tables, routing through the existing OpenAI Chat Completions wire
protocol + LocalOrCustom pass-through. Config-driven selection via
`provider = "<name>"`. Non-goals (deferred): visual picker integration,
per-provider distinct identities, non-openai-compatible kinds.

Touchpoints:
- config crate: ProviderKind::Custom variant + ALL[28] (provider_kind.rs);
  registry Custom entry (provider.rs); total ProviderKind matches +
  ProvidersToml.custom field (lib.rs); conformance env_vars carve-out
  (tests.rs); explicit-Custom pass-through resolver test (route/tests.rs).
- tui crate: ApiProvider::Custom variant + KIND/FROM_KIND lookups,
  ProviderConfig.kind/api_key_env + is_openai_compatible_custom(),
  ProvidersConfig flatten map + custom_provider_config(),
  api_provider() Custom-before-Deepseek safety fix (closes silent
  misroute), name-keyed provider_config_for[_mut], deepseek_base_url/
  default_model/credential_url/model_completion_names/passes_model_through
  Custom arms, deepseek_api_key api_key_env resolution + error, env/header
  override arms, merge_provider_config/merge_custom_providers (config.rs);
  route_runtime preserves the custom name + tests (route_runtime.rs);
  reasoning-effort Custom arms == Openai (client.rs) + from_candidate test;
  provider_base_url_table_key + picker auth Custom arms (config_persistence.rs,
  tui/ui.rs).

Tests: config flatten parse + api_provider Custom/Deepseek; route_runtime
custom endpoint verbatim model + ChatCompletions + insecure-http advisory;
resolver explicit-Custom pass-through; client from_candidate binds custom
base_url + model. Provider-descriptor conformance stays green over Custom.
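
The "Custom-before-Deepseek" ordering listed in the touchpoints can be illustrated with a minimal sketch. All types below are simplified, hypothetical stand-ins (not the crate's actual `ApiProvider` or `ProviderConfig`), and falling back to a default for unknown names is an assumption made for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified shape of a `[providers.<name>]` table.
#[derive(Debug, Clone, PartialEq)]
struct ProviderTable {
    kind: String,
    base_url: String,
    api_key_env: String,
}

/// Hypothetical stand-in for the provider identity.
#[derive(Debug, Clone, PartialEq)]
enum Provider {
    Deepseek,
    /// Dynamic identity; keeps the user-chosen table name.
    Custom(String),
}

/// Resolve `provider = "<name>"`. The custom-table check runs before the
/// default arm, so a configured custom name is never silently routed to
/// the default provider.
fn select_provider(name: &str, tables: &HashMap<String, ProviderTable>) -> Provider {
    match tables.get(name) {
        Some(table) if table.kind == "openai-compatible" => Provider::Custom(name.to_string()),
        // Assumption: anything else falls back to the default provider.
        _ => Provider::Deepseek,
    }
}

/// Example configuration with one custom table (all values are placeholders).
fn example_tables() -> HashMap<String, ProviderTable> {
    let mut tables = HashMap::new();
    tables.insert(
        "my-gateway".to_string(),
        ProviderTable {
            kind: "openai-compatible".to_string(),
            base_url: "http://localhost/v1".to_string(),
            api_key_env: "MY_GATEWAY_KEY".to_string(),
        },
    );
    tables
}
```

With `example_tables()`, selecting `"my-gateway"` yields `Provider::Custom("my-gateway")`, while a name with no matching table takes the default arm.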
… config split

Follow-ons surfaced by integrating #1519 + #3311:
- #1519 added the dynamic ApiProvider::Custom meta-provider. Exclude it from the
  website provider facts (facts-lib.mjs + facts-drift.ts EXCLUDED, like
  DeepseekCN) and from the provider-table drift check (META_PROVIDER_TABLES in
  check-provider-registry.py) — it's a runtime catch-all, not a shipped provider.
- #3311 moved DEFAULT_TEXT_MODEL's definition to config/models.rs, which broke
  the web facts' loose scrape (it mis-bound to an NVIDIA base URL). Read both
  files and match the const definition precisely; default model is deepseek-v4-pro
  again.
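
The "match the const definition precisely" idea can be sketched as follows. The real scrape lives in the web facts scripts; this Rust version is only an illustration, and the helper names `scrape_str_const` and `scrape_from_sources` are hypothetical.

```rust
/// Extract the string literal from a `pub const <name>: &str = "...";` line.
/// Only the definition line matches, so a mention of the name in a comment,
/// a re-export, or a neighbouring URL constant is ignored.
fn scrape_str_const(source: &str, name: &str) -> Option<String> {
    let prefix = format!("pub const {}: &str =", name);
    for line in source.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix(prefix.as_str()) {
            // Expect exactly `"value";` after the `=`.
            let rest = rest.trim().strip_suffix(';')?.trim();
            let inner = rest.strip_prefix('"')?.strip_suffix('"')?;
            return Some(inner.to_string());
        }
    }
    None
}

/// Check the contents of several files in order and return the first
/// definition found (for example the root config file, then a leaf module).
fn scrape_from_sources(sources: &[&str], name: &str) -> Option<String> {
    sources
        .iter()
        .filter_map(|source| scrape_str_const(source, name))
        .next()
}
```

Anchoring on the full `pub const NAME: &str =` prefix is what prevents the scrape from binding to an unrelated string literal that merely appears near the name.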
Bumps workspace + internal pins + npm wrapper + Cargo.lock + README install
tags + the packaged tui changelog slice + generated web facts to 0.8.65, and
finalizes the CHANGELOG [0.8.65] entry (dropped #3478, which is deferred). Via
scripts/release/prepare-release.sh; check-versions.sh green.
@claude

claude Bot commented Jun 24, 2026


Claude finished @Hmbown's task in 4m 30s


Todo:

  • Gather PR context
  • Review diff (correctness, provider/route safety, reuse, tests, security)
  • Review TUI core changes (fleet_setup.rs, client.rs, engine.rs, app.rs)
  • Review config crate changes (catalog.rs, provider.rs, route/resolver.rs)
  • Review web/build changes (layout.tsx, derive-facts.mjs, digest/page.tsx)
  • Post final review

Review: fix(release): v0.8.65 release hardening (audit findings)

This review covers the full 22-commit diff, including the larger feature commits added since the first review pass.


crates/tui/src/fleet/worker_runtime.rs: resolve_fleet_route (lines ~194–228)

Route snapshot always shows DeepSeek even for non-DeepSeek users. resolve_fleet_route passes ApiProvider::Deepseek as the hard-coded scope to resolve_route_candidate. The comment explains why ("worker profile carries no provider authority"), but the consequence is that FleetReceipt.resolved_route.provider_id will always serialize as "deepseek" regardless of the session's active provider (Zai, OpenRouter, custom, etc.). This is an audit/observability gap: the persisted receipt misreports the provider for every non-DeepSeek Fleet run.

The fix is straightforward — thread the active ApiProvider (or its ProviderKind) into resolve_fleet_route so it can pass the correct provider scope. Since the receipt is durable, wrong provider_id in existing ledgers won't be correctable without a migration.

// worker_runtime.rs ~L218: provider scope is hard-coded to ApiProvider::Deepseek
let candidate =
    resolve_route_candidate(ApiProvider::Deepseek, model_selector, None, None).ok()?;
//                          ^^^^^^^^^^^^^^^^^^^^^^^^^^ wrong for Zai / OpenRouter / custom users
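
A minimal sketch of the suggested fix, assuming the active provider is available to the caller. The types are simplified stand-ins: the real `resolve_route_candidate` takes four arguments, and the `Zai` and `Openrouter` variant spellings and the `id()` helper are hypothetical.

```rust
/// Simplified stand-in for the provider identity.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApiProvider {
    Deepseek,
    Zai,
    Openrouter,
}

impl ApiProvider {
    /// Hypothetical serialized id, as it would appear in a receipt.
    fn id(self) -> &'static str {
        match self {
            ApiProvider::Deepseek => "deepseek",
            ApiProvider::Zai => "zai",
            ApiProvider::Openrouter => "openrouter",
        }
    }
}

/// Simplified stand-in for the resolved route snapshot.
#[derive(Debug, PartialEq)]
struct RouteCandidate {
    provider_id: &'static str,
    wire_model_id: String,
}

/// Stand-in resolver: records the scope it was given.
fn resolve_route_candidate(
    scope: ApiProvider,
    model_selector: &str,
) -> Result<RouteCandidate, String> {
    if model_selector.is_empty() {
        return Err("empty model selector".to_string());
    }
    Ok(RouteCandidate {
        provider_id: scope.id(),
        wire_model_id: model_selector.to_string(),
    })
}

/// The active provider is threaded in by the caller instead of being
/// hard-coded, so the snapshot reflects the session's real provider.
fn resolve_fleet_route(active: ApiProvider, model_selector: &str) -> Option<RouteCandidate> {
    resolve_route_candidate(active, model_selector).ok()
}
```

Because the scope is a parameter, a Fleet run on a non-DeepSeek session records that session's provider id rather than "deepseek".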



crates/tui/src/tui/views/fleet_setup.rs: model_class_hint (carried from first review, still present)

"code" and "review" appear in the profile_authoring_prompt format string as valid model_class_hint values, but model_class_hint() can return neither: both fall through to "inherit". The generating LLM is offered these as options, while the function only recognizes a narrower set, so a profile authored with either value silently degrades to "inherit".

// fleet_setup.rs ~L651 — "code" and "review" listed in schema but unreachable
fn model_class_hint(label: &str) -> &'static str {
    match label { "fast" => "fast", "balanced" => "balanced", ... _ => "inherit" }
}
// profile_authoring_prompt says: "one of inherit, fast, balanced, deep-reasoning, code, review, or tool-heavy"
//                                                                               ^^^^  ^^^^^^ unreachable
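
One way to keep the prompt and the matcher from drifting is to derive both from a single list. The sketch below is an assumption, not the file's actual code: the constant `MODEL_CLASS_HINTS`, the helper `model_class_hint_prompt`, and the exact set of accepted values are hypothetical.

```rust
/// Hypothetical single source of truth for accepted hints.
/// "inherit" doubles as the fallback for unknown labels.
const MODEL_CLASS_HINTS: [&str; 5] = [
    "inherit",
    "fast",
    "balanced",
    "deep-reasoning",
    "tool-heavy",
];

/// Map a label to an accepted hint, falling back to "inherit".
fn model_class_hint(label: &str) -> &'static str {
    MODEL_CLASS_HINTS
        .iter()
        .copied()
        .find(|hint| *hint == label)
        .unwrap_or("inherit")
}

/// Build the prompt fragment from the same list, so the prompt can never
/// advertise a value the matcher would reject.
fn model_class_hint_prompt() -> String {
    let (last, rest) = MODEL_CLASS_HINTS
        .split_last()
        .expect("hint list is non-empty");
    format!("one of {}, or {}", rest.join(", "), last)
}
```

Whether "code" and "review" should become reachable or be dropped from the prompt is a separate decision; either way, adding or removing an entry in the list updates both sides at once.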



crates/config/src/route/resolver.rs — Provider/Route Safety ✓

The module header and RouteRequest struct are explicit: "no prompt-text / freeform field … the resolver cannot see prompt content, so it cannot silently route on it." The classify() function correctly separates strict-direct (DeepSeek/Zai), aggregator, and local/custom. The test suite in route/tests.rs (+259 lines) directly covers the no-prefix-inference invariant (resolver_no_explicit_provider_does_not_infer_deepseek_from_prefix, resolver_aggregator_preserves_prefixed_wire_id_without_inferring_deepseek). Clean.
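
The no-prefix-inference invariant can be shown with a small sketch. These types are hypothetical simplifications of the resolver's inputs and outputs, and the `session_default` parameter is an assumption about where the provider comes from when none is given explicitly.

```rust
/// Simplified stand-in for the provider classes named in the review.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProviderKind {
    Deepseek,
    Aggregator,
    Local,
}

/// The request carries only structured fields. There is no prompt-text
/// field, so the resolver has nothing freeform to route on.
struct RouteRequest<'a> {
    explicit_provider: Option<ProviderKind>,
    model: &'a str,
}

#[derive(Debug, PartialEq)]
struct Resolved {
    provider: ProviderKind,
    wire_model_id: String,
}

/// The provider comes from the explicit field or the session default,
/// never from a "vendor/" prefix inside the model string. The model id is
/// passed through verbatim.
fn resolve(req: &RouteRequest, session_default: ProviderKind) -> Resolved {
    let provider = req.explicit_provider.unwrap_or(session_default);
    Resolved {
        provider,
        wire_model_id: req.model.to_string(),
    }
}
```

A model string such as "deepseek/some-model" sent through an aggregator session therefore stays on the aggregator with its prefixed wire id intact.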

api.deepseeki.com in normalize_route_base_url is the China regional endpoint (confirmed in REBRAND.md), not a typo.


crates/tui/src/client.rs: from_candidate constructor (lines ~657–683) ✓

The design is correct: base_url and wire_model_id come from the resolver candidate; secrets (api_key, provider context) come from route_config, which resolve_runtime_route/prepared_route_config already stamps with the correct provider before from_candidate is called. The two inputs cannot drift because they share the same from_parts body.


crates/tui/src/tui/app.rs — Fallback chain tests ✓

The three new tests in app/tests.rs (advance_fallback_skips_unauthed_middle_provider, _local_provider_is_eligible_without_a_key, _all_unready_exhausts_with_clear_reason) cover the three meaningful cases well. One gap: no test for ApiProvider::Custom in the chain — it carries no built-in env var and auth is named per-entry, so the eligibility check is different from other providers.
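
The missing Custom case could be covered along these lines. This is a sketch under assumptions: the `Provider` enum, `is_ready`, and `advance_fallback` below are hypothetical stand-ins for the app's fallback logic, and the environment is injected as a closure so the example does not touch real environment variables.

```rust
/// Hypothetical, simplified provider entries in a fallback chain.
#[derive(Debug, Clone, PartialEq)]
enum Provider {
    /// Built-in provider with a fixed, well-known env var.
    Builtin { env_var: &'static str },
    /// Local provider: eligible without a key.
    Local,
    /// Custom provider: the env var name comes from the config entry.
    Custom { api_key_env: String },
}

/// A provider is ready when its credential (if it needs one) is present.
fn is_ready<F: Fn(&str) -> Option<String>>(provider: &Provider, env: &F) -> bool {
    match provider {
        Provider::Builtin { env_var } => env(*env_var).is_some(),
        Provider::Local => true,
        Provider::Custom { api_key_env } => env(api_key_env.as_str()).is_some(),
    }
}

/// Return the index of the next ready provider after `current`, or an
/// error with a clear reason when the chain is exhausted.
fn advance_fallback<F: Fn(&str) -> Option<String>>(
    chain: &[Provider],
    current: usize,
    env: &F,
) -> Result<usize, String> {
    chain
        .iter()
        .enumerate()
        .skip(current + 1)
        .find(|&(_, provider)| is_ready(provider, env))
        .map(|(index, _)| index)
        .ok_or_else(|| "fallback chain exhausted: no remaining provider is ready".to_string())
}
```

A test in this style would place a Custom entry after an unauthenticated built-in and assert that the chain skips to it only when the env var named by that entry is set.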


crates/config/src/provider.rs — Custom provider (#1519) ✓

The Custom placeholder defaults (http://localhost/v1, "custom-model") fail closed: a misconfigured entry hits loopback rather than a public host. Auth is resolved from the per-entry api_key_env field in deepseek_api_key() (path 1b, ~L3246), keeping the secret out of config storage. The provider_config_key() returning "custom" is intentional; the note in prepared_route_config explains why the provider string must not be overwritten for the Custom variant.


Refactored crates/tui/src/config.rs

The monolith split into models, paths, search, subagent_limits leaf modules is purely mechanical: each leaf re-exports every public item, so crate::config::<CONST> paths are unchanged. No behavioral change. +248/-539 net reduction in the root file.
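
The re-export pattern that keeps existing paths stable can be sketched in a single file. The module and constant names `models`, `subagent_limits`, and `DEFAULT_TEXT_MODEL` come from the PR text; `MAX_SUBAGENTS` and its value are hypothetical, and having the root module glob re-export its leaves is one possible arrangement, not necessarily the one used.

```rust
mod config {
    // Leaf module: owns the definition after the split.
    mod models {
        pub const DEFAULT_TEXT_MODEL: &str = "deepseek-v4-pro";
    }

    // Second leaf with a hypothetical constant, for illustration only.
    mod subagent_limits {
        pub const MAX_SUBAGENTS: usize = 6;
    }

    // Re-export every public item so callers keep using `config::<CONST>`.
    pub use self::models::*;
    pub use self::subagent_limits::*;
}

/// A caller written before the split compiles unchanged.
fn default_model() -> &'static str {
    config::DEFAULT_TEXT_MODEL
}
```

Because the leaves stay private and only their items are re-exported, callers cannot start depending on the new module layout by accident.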


web/app/layout.tsx (Fix #4) — Low-risk deviation ✓

Returning children directly from the root layout is a deviation from Next.js's documented <html>/<body> requirement, but it works in practice because every reachable route goes through app/[locale]/layout.tsx. The risk noted in the first review stands: a future root-segment page (e.g. app/some-page/) won't get an HTML shell, but no such page exists today.


.cargo/audit.toml, AGENTS.md/CLAUDE.md, web derive-facts.mjs

All three already reviewed and signed off in the first review pass. No new concerns.


Summary

Two actionable items, both correctness issues:

| # | File | Finding | Priority |
|---|------|---------|----------|
| 1 | fleet/worker_runtime.rs:218 | resolve_fleet_route hardcodes ApiProvider::Deepseek; receipt provider_id is wrong for all non-DeepSeek Fleet runs | Correctness |
| 2 | tui/views/fleet_setup.rs:~651 | "code" / "review" listed in schema docstring but unreachable from model_class_hint | Correctness (carried from first review) |

Everything else is clean: provider/route safety invariants are enforced and tested, the auth split in from_candidate is correct, the Custom provider fails closed, and the refactor is purely mechanical.

@Hmbown Hmbown changed the title fix(release): v0.8.65 release hardening (audit findings) release: v0.8.65 (provider/route + Fleet epics, hardening, version bump) Jun 24, 2026
@Hmbown Hmbown merged commit f1f7982 into main Jun 24, 2026
17 checks passed